Da Nang
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- North America > United States > New York (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- North America > United States > Washington > King County > Seattle (0.04)
- Asia > Vietnam > Da Nang > Da Nang (0.04)
- Africa > Rwanda (0.04)
Data-Aware and Scalable Sensitivity Analysis for Decision Tree Ensembles
Varshney, Namrita, Gupta, Ashutosh, Ahmad, Arhaan, Tayal, Tanay V., Akshay, S.
Decision tree ensembles are widely used in critical domains, making robustness and sensitivity analysis essential to their trustworthiness. We study the feature sensitivity problem, which asks whether an ensemble is sensitive to a specified subset of features (such as protected attributes) whose manipulation can alter model predictions. Existing approaches often yield examples of sensitivity that lie far from the training distribution, limiting their interpretability and practical value. We propose a data-aware sensitivity framework that constrains the sensitive examples to remain close to the dataset, thereby producing realistic and interpretable evidence of model weaknesses. To this end, we develop novel techniques for data-aware search using a combination of mixed-integer linear programming (MILP) and satisfiability modulo theories (SMT) encodings. Our contributions are fourfold. First, we strengthen the NP-hardness result for sensitivity verification, showing that it holds even for trees of depth 1. Second, we develop MILP optimizations that significantly speed up sensitivity verification for single ensembles and, for the first time, also handle multiclass tree ensembles. Third, we introduce a data-aware framework that generates realistic examples close to the training distribution. Finally, we conduct an extensive experimental evaluation on large tree ensembles, demonstrating scalability to ensembles of up to 800 trees of depth 8 and substantial improvements over the state of the art. This framework provides a practical foundation for analyzing the reliability and fairness of tree-based models in high-stakes applications.
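To make the feature sensitivity question concrete, the following sketch checks it by exhaustive search for the depth-1 case (an ensemble of decision stumps) mentioned above. It illustrates the problem definition only: it is not the authors' MILP/SMT method, and it ignores the data-aware closeness constraint. The `Stump` type, the "class 1 iff the summed score is positive" decision rule, and all function names are assumptions made for this example. Because each stump tests a single feature, trying one representative value per threshold interval covers every possible behavior, but the enumeration is still exponential in the number of features.

```python
from itertools import product
from typing import Dict, List, NamedTuple, Optional, Set, Tuple


class Stump(NamedTuple):
    """Hypothetical depth-1 tree: adds `left` to the score if x[feature] <= threshold, else `right`."""
    feature: int
    threshold: float
    left: float
    right: float


def predict(stumps: List[Stump], x: List[float]) -> int:
    """Assumed binary rule: class 1 iff the summed leaf scores are positive."""
    score = sum(s.left if x[s.feature] <= s.threshold else s.right for s in stumps)
    return int(score > 0)


def representatives(stumps: List[Stump], f: int) -> List[float]:
    """One value inside each interval induced by the thresholds on feature f."""
    ts = sorted({s.threshold for s in stumps if s.feature == f})
    # t_i lies in (t_{i-1}, t_i]; t_max + 1 lies in (t_max, inf).
    return ts + [ts[-1] + 1.0] if ts else [0.0]


def find_sensitivity_witness(
    stumps: List[Stump], n_features: int, sensitive: Set[int]
) -> Optional[Tuple[List[float], List[float]]]:
    """Return (x, x2) that agree on all non-sensitive features but get different predictions, or None."""
    reps = [representatives(stumps, f) for f in range(n_features)]
    free = [f for f in range(n_features) if f not in sensitive]
    sens = sorted(sensitive)
    for fixed in product(*(reps[f] for f in free)):
        seen: Dict[int, List[float]] = {}  # prediction -> first example producing it
        for varied in product(*(reps[f] for f in sens)):
            x = [0.0] * n_features
            for f, v in zip(free, fixed):
                x[f] = v
            for f, v in zip(sens, varied):
                x[f] = v
            seen.setdefault(predict(stumps, x), x)
            if len(seen) == 2:
                return seen[0], seen[1]
    return None
```

For instance, with a strong stump on feature 0 and a weak one on feature 1, the ensemble is sensitive to {0} (flipping feature 0 flips the sign of the score) but not to {1}.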
- North America > United States > New York > New York County > New York City (0.14)
- North America > Puerto Rico (0.04)
- Asia > Middle East > Iran > Tehran Province > Tehran (0.04)
- Banking & Finance (0.67)
- Health & Medicine > Pharmaceuticals & Biotechnology (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.68)
- North America > Canada > Quebec > Montreal (0.24)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.96)
- Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.94)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.94)
Contrastive Integrated Gradients: A Feature Attribution-Based Method for Explaining Whole Slide Image Classification
Vu, Anh Mai, Vo, Tuan L., Bui, Ngoc Lam Quang, Binh, Nam Nguyen Le, Awasthi, Akash, Vo, Huy Quoc, Nguyen, Thanh-Huy, Han, Zhu, Mohan, Chandra, Van Nguyen, Hien
Interpretability is essential in Whole Slide Image (WSI) analysis for computational pathology, where understanding model predictions helps build trust in AI-assisted diagnostics. While Integrated Gradients (IG) and related attribution methods have shown promise, applying them directly to WSIs is challenging because of the very high resolution of these images. These methods capture model decision patterns but may overlook class-discriminative signals that are crucial for distinguishing between tumor subtypes. In this work, we introduce Contrastive Integrated Gradients (CIG), a novel attribution method that enhances interpretability by computing contrastive gradients in logit space. First, CIG highlights class-discriminative regions by comparing feature importance relative to a reference class, offering sharper differentiation between tumor and non-tumor areas. Second, CIG satisfies the axioms of integrated attribution, ensuring consistency and theoretical soundness. Third, we propose two attribution quality metrics, MIL-AIC and MIL-SIC, which measure how predictive information and model confidence evolve with access to salient regions, particularly under weak supervision.
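The following pure-Python sketch shows one reading of the contrastive idea: Integrated Gradients applied to the logit difference z_target - z_ref between a target class and a reference class, so that attributions explain why the model prefers one class over the other. The tiny tanh network (`V`, `W`), the midpoint Riemann sum, the step count, and all function names are assumptions for illustration; the paper's CIG operates on MIL models over WSI patches and may differ in its details. By the completeness axiom, the attributions should sum to the change in the logit difference between the baseline and the input.

```python
import math
from typing import List, Tuple

Vec = List[float]
Mat = List[List[float]]


def forward(V: Mat, W: Mat, x: Vec) -> Tuple[Vec, Vec]:
    """Hypothetical toy model: hidden h = tanh(V x), logits z = W h."""
    h = [math.tanh(sum(v * xi for v, xi in zip(row, x))) for row in V]
    z = [sum(w * hk for w, hk in zip(row, h)) for row in W]
    return h, z


def contrastive_score(V: Mat, W: Mat, x: Vec, target: int, ref: int) -> float:
    """Logit difference between the target class and the reference class."""
    _, z = forward(V, W, x)
    return z[target] - z[ref]


def contrastive_grad(V: Mat, W: Mat, x: Vec, target: int, ref: int) -> Vec:
    """Analytic gradient of z_target - z_ref with respect to the input x."""
    h, _ = forward(V, W, x)
    # d(z_t - z_r)/dh_k, chained through tanh'(a) = 1 - tanh(a)^2.
    dh = [(W[target][k] - W[ref][k]) * (1.0 - h[k] ** 2) for k in range(len(h))]
    return [sum(dh[k] * V[k][j] for k in range(len(h))) for j in range(len(x))]


def contrastive_ig(V: Mat, W: Mat, x: Vec, baseline: Vec,
                   target: int, ref: int, steps: int = 200) -> Vec:
    """IG of the logit difference along the straight path from baseline to x."""
    total = [0.0] * len(x)
    for s in range(steps):
        alpha = (s + 0.5) / steps  # midpoint Riemann rule
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        g = contrastive_grad(V, W, point, target, ref)
        total = [t + gj for t, gj in zip(total, g)]
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, total)]
```

Setting `ref` equal to `target` makes every attribution zero, which is a quick sanity check that the scores are purely contrastive.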
- North America > United States > Texas > Harris County > Houston (0.04)
- Europe > Italy > Marche > Ancona Province > Ancona (0.04)
- Asia > Vietnam > Hồ Chí Minh City > Hồ Chí Minh City (0.04)
- Asia > Vietnam > Da Nang > Da Nang (0.04)
- Health & Medicine > Therapeutic Area > Oncology (1.00)
- Health & Medicine > Diagnostic Medicine (0.68)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Sensing and Signal Processing > Image Processing (0.83)
- Information Technology > Artificial Intelligence > Vision (0.66)
IBMA: An Imputation-Based Mixup Augmentation Using Self-Supervised Learning for Time Series Data
Nguyen, Dang Nha, Nguyen, Hai Dang, Nguyen, Khoa Tho Anh
Data augmentation plays a crucial role in enhancing model performance across various AI fields by introducing variability while maintaining the underlying temporal patterns. However, for long-sequence time series data, where maintaining temporal consistency is critical, far fewer augmentation strategies exist than for images or text, and advanced techniques such as Mixup are rarely used. In this work, we propose a new approach, Imputation-based Mixup Augmentation (IMA), which combines Imputed-data Augmentation with Mixup Augmentation to strengthen model generalization and improve forecasting performance. We evaluate this method across several forecasting models, including DLinear (MLP), TimesNet (CNN), and iTransformer (Transformer), which represent some of the most recent advances in long-sequence time series forecasting. Our experiments, conducted on three datasets from different domains (ETT-small, Illness, Exchange Rate) and compared against eight other augmentation techniques, demonstrate that IMA consistently enhances performance: it yields improvements in 22 of 24 instances, 10 of which are the best results, particularly with iTransformer imputation on the ETT dataset. The GitHub repository is available at: https://github.com/dangnha/IMA.
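A minimal sketch of how imputation and Mixup could be chained for forecasting windows, assuming a random mask over interior time steps, linear interpolation as a stand-in for the model-based imputer (the abstract mentions iTransformer imputation), and standard Mixup with a mixing weight drawn from Beta(alpha, alpha) applied to both input windows and forecast targets. The function names, default mask ratio and alpha, and the exact way the two steps are combined are assumptions, not the IMA implementation in the linked repository.

```python
import bisect
import random
from typing import List, Optional, Sequence, Tuple


def mask_and_impute(series: Sequence[float], mask_ratio: float,
                    rng: random.Random) -> List[float]:
    """Hide a random subset of interior points and refill them by linear interpolation."""
    n = len(series)
    k = min(int(mask_ratio * n), max(n - 2, 0))
    masked = set(rng.sample(range(1, n - 1), k)) if k > 0 else set()
    observed = [i for i in range(n) if i not in masked]  # sorted; endpoints always kept
    out = list(series)
    for i in masked:
        pos = bisect.bisect_left(observed, i)  # first observed index after i
        lo, hi = observed[pos - 1], observed[pos]
        w = (i - lo) / (hi - lo)
        out[i] = (1 - w) * series[lo] + w * series[hi]
    return out


def mixup(x1: Sequence[float], x2: Sequence[float],
          y1: Sequence[float], y2: Sequence[float],
          alpha: float, rng: random.Random) -> Tuple[List[float], List[float], float]:
    """Standard Mixup: the same convex weight lam mixes inputs and targets."""
    lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam


def ima_augment(x1: Sequence[float], y1: Sequence[float],
                x2: Sequence[float], y2: Sequence[float],
                mask_ratio: float = 0.2, alpha: float = 0.4,
                rng: Optional[random.Random] = None):
    """Hypothetical combination: impute each input window, then Mixup the pair."""
    rng = rng or random.Random()
    return mixup(mask_and_impute(x1, mask_ratio, rng),
                 mask_and_impute(x2, mask_ratio, rng), y1, y2, alpha, rng)
```

Mixing targets with the same weight as inputs keeps each augmented window paired with a consistent forecast label; whether IMA mixes targets this way is not stated in the abstract.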
- North America > United States (0.04)
- North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.04)
- Asia > Vietnam > Hồ Chí Minh City > Hồ Chí Minh City (0.04)
- Information Technology > Data Science > Data Mining (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
MultiMed-ST: Large-scale Many-to-many Multilingual Medical Speech Translation
Le-Duc, Khai, Tran, Tuyen, Tat, Bach Phan, Bui, Nguyen Kim Hai, Dang, Quan, Tran, Hung-Phong, Nguyen, Thanh-Thuy, Nguyen, Ly, Phan, Tuan-Minh, Tran, Thi Thu Phuong, Ngo, Chris, Khanh, Nguyen X., Nguyen-Tang, Thanh
Multilingual speech translation (ST) and machine translation (MT) in the medical domain enhance patient care by enabling efficient communication across language barriers, alleviating specialized workforce shortages, and facilitating improved diagnosis and treatment, particularly during pandemics. In this work, we present, to the best of our knowledge, the first systematic study of medical ST by releasing MultiMed-ST, a large-scale ST dataset for the medical domain that spans all translation directions among five languages (Vietnamese, English, German, French, and Simplified/Traditional Chinese), together with the models. With 290,000 samples, it is the largest medical MT dataset and the largest many-to-many multilingual ST dataset across all domains. Second, we present, to the best of our knowledge, the most comprehensive ST analysis in the field's history, including empirical baselines, a bilingual-multilingual comparative study, an end-to-end vs. cascaded comparative study, a task-specific vs. multi-task sequence-to-sequence comparative study, code-switch analysis, and quantitative-qualitative error analysis. All code, data, and models are available online: https://github.com/leduckhai/MultiMed-ST
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > Canada > Ontario > Toronto (0.13)
- Oceania > Australia > Victoria > Melbourne (0.04)
- Information Technology > Security & Privacy (1.00)
- Health & Medicine > Therapeutic Area (1.00)
- Government (1.00)
Deep Feature Optimization for Enhanced Fish Freshness Assessment
Hoang, Phi-Hung, Trinh, Nam-Thuan, Tran, Van-Manh, Phan, Thi-Thu-Hong
Assessing fish freshness is vital for ensuring food safety and minimizing economic losses in the seafood industry. However, traditional sensory evaluation remains subjective, time-consuming, and inconsistent. Although recent advances in deep learning have automated visual freshness prediction, challenges related to accuracy and feature transparency persist. This study introduces a unified three-stage framework that refines and leverages deep visual representations for reliable fish freshness assessment. First, five state-of-the-art vision architectures (ResNet-50, DenseNet-121, EfficientNet-B0, ConvNeXt-Base, and Swin-Tiny) are fine-tuned to establish a strong baseline. Next, multi-level deep features extracted from these backbones are used to train seven classical machine learning classifiers, integrating deep and traditional decision mechanisms. Finally, feature selection methods based on Light Gradient Boosting Machine (LGBM), Random Forest, and Lasso identify a compact and informative subset of features. Experiments on the Freshness of the Fish Eyes (FFE) dataset demonstrate that the best configuration, which combines Swin-Tiny features, an Extra Trees classifier, and LGBM-based feature selection, achieves an accuracy of 85.99%, outperforming recent studies on the same dataset by 8.69-22.78%. These findings confirm the effectiveness and generalizability of the proposed framework for visual quality evaluation tasks.
- Europe > Switzerland (0.04)
- North America > United States > New York (0.04)
- North America > United States > California > Los Angeles County > Long Beach (0.04)
- Asia > Vietnam > Da Nang > Da Nang (0.04)
- Health & Medicine (1.00)
- Food & Agriculture > Fishing (1.00)
Pruning and Quantization Impact on Graph Neural Networks
Khedri, Khatoon, Rawassizadeh, Reza, Wen, Qifu, Hosseinzadeh, Mehdi
Graph neural networks (GNNs) are known to achieve high accuracy when learning from graph-structured data, but they incur high computational and resource costs. Neural network compression methods reduce model size while maintaining reasonable accuracy; two common compression techniques are pruning and quantization. In this research, we empirically examine the effects of three pruning methods and three quantization methods on different GNN models across graph classification, node classification, and link prediction tasks. We conducted all experiments on three graph datasets: Cora, Proteins, and BBBP. Our findings demonstrate that unstructured fine-grained and global pruning can significantly reduce model size (by 50%) while maintaining or even improving precision after fine-tuning the pruned model. The evaluation of different quantization methods on GNNs shows diverse impacts on accuracy, inference time, and model size across datasets.
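The sketch below illustrates two representative compression operations of the kinds the study compares: global unstructured magnitude pruning, which ranks the weights of all layers together and zeroes the smallest ones, and symmetric per-tensor int8 quantization. These specific variants, the dictionary-of-lists weight format, and the function names are assumptions; the abstract does not name the exact three pruning and three quantization methods evaluated.

```python
from typing import Dict, List, Tuple

Weights = Dict[str, List[float]]


def global_magnitude_prune(weights: Weights, sparsity: float) -> Weights:
    """Zero the globally smallest-magnitude fraction of weights across all layers."""
    all_mags = sorted(abs(w) for ws in weights.values() for w in ws)
    k = int(sparsity * len(all_mags))
    if k == 0:
        return {name: list(ws) for name, ws in weights.items()}
    threshold = all_mags[k - 1]  # ties at the threshold may prune slightly more than k
    return {name: [0.0 if abs(w) <= threshold else w for w in ws]
            for name, ws in weights.items()}


def quantize_int8(ws: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization to integers in [-127, 127]."""
    max_abs = max((abs(w) for w in ws), default=0.0)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in ws]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Map int8 codes back to floats; error per weight is at most scale / 2."""
    return [v * scale for v in q]
```

Pruning globally rather than per layer lets layers with many small weights absorb more of the sparsity budget, which is one reason global schemes can reach a 50% size reduction with limited accuracy loss.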
- Information Technology (1.00)
- Health & Medicine (1.00)
High-Precision Climbing Robot Localization Using Planar Array UWB/GPS/IMU/Barometer Integration
Zhang, Shuning, Zhu, Zhanchen, Chen, Xiangyu, Wang, Yunheng, Jiang, Xu, Duan, Peibo, Xu, Renjing
To address the need for high-precision localization of climbing robots in complex high-altitude environments, this paper proposes a multi-sensor fusion system that overcomes the limitations of single-sensor approaches. First, the localization scenarios and the problem model are analyzed. An integrated architecture based on an Attention Mechanism-based Fusion Algorithm (AMFA), incorporating planar array Ultra-Wideband (UWB), GPS, an Inertial Measurement Unit (IMU), and a barometer, is designed to handle challenges such as GPS occlusion and UWB Non-Line-of-Sight (NLOS) conditions. Then, end-to-end neural network inference models for UWB and the barometer are developed, along with a multimodal attention mechanism for adaptive data fusion. An Unscented Kalman Filter (UKF) is applied to refine the trajectory, improving accuracy and robustness. Finally, real-world experiments show that the method achieves 0.48 m localization accuracy and a lower maximum error of 1.50 m, outperforming baseline algorithms such as GPS/INS-EKF and demonstrating stronger robustness.
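The UKF step relies on the unscented transform, which propagates a mean and variance through a nonlinear function using a small set of deterministically chosen sigma points. The scalar sketch below uses the basic symmetric sigma-point set with parameter kappa (an assumed default of 2.0, so that n + kappa = 3 for a one-dimensional state). A full UKF, as used here for trajectory refinement, applies the same transform to a multidimensional state in its predict and update steps; the paper's state model and tuning are not given in the abstract, and the function name is hypothetical.

```python
import math
from typing import Callable, Tuple


def unscented_transform_1d(mean: float, var: float,
                           f: Callable[[float], float],
                           kappa: float = 2.0) -> Tuple[float, float]:
    """Propagate N(mean, var) through f with three sigma points (state dimension n = 1)."""
    n = 1
    spread = math.sqrt((n + kappa) * var)
    points = [mean, mean + spread, mean - spread]
    w0 = kappa / (n + kappa)          # weight of the central point
    wi = 1.0 / (2 * (n + kappa))      # weight of each outer point
    weights = [w0, wi, wi]            # weights sum to 1
    ys = [f(p) for p in points]
    y_mean = sum(w * y for w, y in zip(weights, ys))
    y_var = sum(w * (y - y_mean) ** 2 for w, y in zip(weights, ys))
    return y_mean, y_var
```

For a linear map the transform is exact, and with kappa = 2 it also recovers the exact mean and variance of x squared for a standard Gaussian x, which is why this choice is a common default for one-dimensional states.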
- Asia > Japan (0.05)
- Asia > China > Sichuan Province > Chengdu (0.04)
- Asia > China > Guangdong Province > Guangzhou (0.04)